




Europäisches Patentamt

European Patent Office

Office européen des brevets

⑪ Publication number:

**0 219 203**

A2

⑫

## EUROPEAN PATENT APPLICATION

⑬ Application number: 86306269.1

⑮ Int. Cl. 4: **G 06 F 9/38**

⑭ Date of filing: 14.08.86

⑩ Priority: 30.08.85 US 771327

⑬ Applicant: ADVANCED MICRO DEVICES, INC.  
901 Thompson Place P.O. Box 3453  
Sunnyvale, CA 94088(US)

⑪ Date of publication of application:  
22.04.87 Bulletin 87/17

⑭ Inventor: Case, Brian  
575 Rengstorff 113  
Mt. View, CA 94040(US)

⑫ Designated Contracting States:  
AT BE CH DE FR GB IT LI LU NL SE

⑭ Inventor: Fleck, Rod  
928 Wright Avenue 304  
Mt. View, CA 94043(US)

⑭ Inventor: Moller, Ole  
Rormosen 310  
DK-2990 Nivaas(DK)

⑭ Inventor: Kong, Cheng-Gang  
2822 Agua Vista Drive  
San Jose, CA 95132(US)

⑭ Representative: Wright, Hugh Ronald et al,  
Brookes & Martin High Holborn House 52/54 High Holborn  
London WC1V 6SE(GB)

⑮ Computer control providing single-cycle branching.

⑯ An instruction processor suitable for use in a reduced instruction-set computer employing an instruction pipeline which performs conditional branching in a single processor cycle. The processor treats a branch condition as a normal instruction operand rather than a special case within a separate condition code register. The condition bit and the branch target address determine which instruction is to be fetched, the branch not taking effect until the next-following instruction is executed. In this manner, no replacement of the instruction which physically follows the branch instruction in the pipeline need be made, and the branch occurs within the single cycle of the pipeline allocated to it. A simple circuit implements this delayed-branch method. A computer incorporating the processor readily executes special-handling techniques for calls on subroutines, interrupts and traps.

EP 0 219 203 A2

DIGITAL INSTRUCTION PROCESSOR CONTROL

This invention relates to a method and apparatus for processing instructions for a digital computer and, more particularly, to processing branch instructions in a pipeline using only the single cycle allocated in the pipeline to the instruction, without need for branch prediction or complex circuitry.

BACKGROUND OF THE INVENTION

Reduced instruction set computers (RISCs) exploit the advantages of simple decoding and the pipelined execution of instructions. Branch instructions are required in a computer to control the flow of instructions. A branch instruction in a pipelined computer will normally delay the pipeline until the instruction at the location to which the branch instruction transfers control, the "branch address", is fetched. As such, these instructions impede the normal pipelined flow of instructions. Known in the prior art are elaborate techniques which delay the effect of branches ("delayed branching"), which predict branches ahead of time and correct for wrong predictions, or which fetch multiple instructions until the direction of the branch is known.

Since most of these techniques are too complex for a RISC architecture, the delayed branch is chosen for it; the delayed branch allows a RISC always to fetch the (physically) next instruction during the execution of the current instruction. As most RISCs employ pipelining of instructions, delayed branching in the prior art requires two instruction processor clock cycles to execute a branch instruction. This disrupts the instruction pipeline. Complex circuitry was introduced in the prior art to eliminate such disruption. Since branch instructions occur frequently within the instruction stream, prior art computers were slower and more complex than desired.


Since calls on subroutines and interrupt and trap routines similarly involve branching, the time penalties incurred in the prior art RISCs are also present for these commonly-occurring procedures. Accordingly, there is a need for an instruction processor suitable for use in a RISC which performs branches in a single cycle, and thus does not disrupt the instruction pipeline, while providing completely accurate branch prediction without requiring complex circuitry.

The instruction processor to be described provides a program counter for use in a pipelined RISC in which branch instructions include a bit stored in a general-purpose register, instead of a condition code register, which allows the branch condition to be treated as a normal instruction operand, instead of as a special case within the condition code register. During the decode cycle of the branch instruction, the condition bit is fetched and the branch "target" address is computed by a separate relative address adder, or fetched from a register, depending on the type of branch instruction being executed.

At the beginning of the execution cycle of the branch instruction, the condition bit and target address control which instruction is executed next. A multiplexer implements this control. In this manner, no replacement of the instruction which physically follows the branch instruction in the pipeline need be made, and thus the pipeline can execute at the maximum rate without interruption. Accordingly, the branch occurs within the single cycle of the pipeline allocated to it.

A computer incorporating the processor also readily executes special-handling techniques for calls on subroutines, interrupts and trap routines.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of an instruction processor employing a program counter unit of the present invention providing single-cycle branching;

Fig. 2 is a timing diagram of the pipeline stages during processing of a branch instruction by the instruction processor of the present invention; and

Fig. 3 is a composite timing diagram of the pipeline stages during the processing of an interrupt routine by the instruction processor, and during return from the interrupt routine.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A program counter portion of a digital processor control 10 providing the single-cycle branch capability of the present invention is illustrated in block diagram form in Fig. 1. An instruction register, not shown in Fig. 1, contains the instruction which is currently being executed by processor 10. If this instruction calls for a branch to be executed, a branch condition will be stored in a predetermined bit position within a general-purpose register, not shown in Fig. 1, where it will be treated by the processor 10 as an instruction operand. During the instruction decode cycle of the processor 10, when a branch instruction is being decoded, the branch condition will be retrieved from the general-purpose register and used during the instruction execution cycle of processor 10. The branch condition and a target address, described below, are used to determine the location of the instruction to be executed next by processor 10. The simple logic illustrated in Fig. 1 permits an instruction pipeline to operate without disruption even in the case of an instruction calling for conditional branching.
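As a rough illustration of treating the branch condition as an ordinary operand, the sketch below keeps the condition in one bit of a general-purpose register and reads it back as the decode stage might. The 32-bit word width, the choice of the top bit as the "predetermined bit position", and the function names are assumptions made for the example; the description does not specify them.

```python
# Hypothetical parameters: the description gives neither the word width
# nor the predetermined bit position.
WORD_BITS = 32
CONDITION_BIT = WORD_BITS - 1


def write_condition(registers, index, condition):
    """Store a Boolean result in the condition bit of a general-purpose register."""
    mask = 1 << CONDITION_BIT
    if condition:
        registers[index] |= mask
    else:
        # Clear only the condition bit, keeping the value within the word width.
        registers[index] &= ~mask & ((1 << WORD_BITS) - 1)


def read_condition(registers, index):
    """Fetch the branch condition during decode, like any other register operand."""
    return bool((registers[index] >> CONDITION_BIT) & 1)
```

Because the condition lives in the register file, no separate condition code register needs to be read or forwarded in this model.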

The processor 10 executes branch instructions which can specify one of three branch destinations: a "relative or absolute branch address", an "indirect branch address", or a "trap address". A fourth "continue address" is simply that of the next-following instruction which will be executed if a branch is not undertaken. The paths illustrated in Fig. 1 show only the data flow between the indicated elements and are capable of conducting several signals in parallel. Control signal paths are also required, as will be appreciated by those skilled in the art, but are not shown in Fig. 1 because it is well known by those skilled in the art how to effect control of the various illustrated elements.

With reference to Fig. 1, processor 10 includes a branch target address (BRN TGT) multiplexer/register 12 which receives from a general-purpose register file, not shown in Fig. 1, via a data path 14 an address which contains the location to which a branch is to be made by processor 10, the so-called "indirect branch address". A second type of branch address, the so-called "relative or absolute branch address", is determined by an adder 16 which at a first input receives from the instruction register a branch displacement value via a data path 18. This value can be added to the address of the presently-executed instruction received at a second input to adder 16 via a data path 20, resulting in the relative branch address. Should an addition not be performed, only the branch displacement value will be generated by the adder 16, resulting in the absolute branch address. The address generated by adder 16 is conducted via a data path 22 to a second input of BRN TGT multiplexer/register 12.
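The selection among the branch address types described above can be sketched as follows. This is a minimal behavioural model, not the circuit itself: the `BranchKind` names and the function signature are hypothetical, and `path20_address` simply stands for whatever address is present on data path 20.

```python
from enum import Enum


class BranchKind(Enum):
    """Hypothetical labels for the address types named in the description."""
    RELATIVE = "relative"
    ABSOLUTE = "absolute"
    INDIRECT = "indirect"


def branch_target(kind, displacement, path20_address, register_value):
    """Model adder 16 followed by the BRN TGT multiplexer/register 12."""
    if kind is BranchKind.INDIRECT:
        # First input of the multiplexer: address read from the register file.
        return register_value
    if kind is BranchKind.RELATIVE:
        # Adder 16 adds the displacement to the address on data path 20.
        return path20_address + displacement
    # Absolute: no addition is performed, the displacement passes through.
    return displacement
```

For example, `branch_target(BranchKind.RELATIVE, 8, 100, 0)` yields 108, while the same displacement with `BranchKind.ABSOLUTE` yields 8.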

The BRN TGT multiplexer/register 12 selects the branch target address under a control signal generated by processor 10 in accordance with the branch instruction being executed. The selected address is generated at an output of multiplexer/register 12 and conducted via data path 24 to a first input of a multiplexer (MUX) 26, which has an output terminal connected via a data path 28 to an input terminal of an instruction cache 30. The instruction cache 30 contains a set of storage locations, 512 in the preferred embodiment, for storing sets of contiguous instructions which constitute a portion of the program being currently executed by processor 10. Application of an address at the input terminal of cache 30 causes the instruction stored at that address to be conducted to the instruction register of processor 10 to become the next instruction to be executed thereby.

Also receiving the address generated by MUX 26 is a program counter (PC) stack 32 comprising a decode PC register 34 to which is conducted via data path 28 the address generated by MUX 26, an execute PC register 36 to which is conducted via a data path 38 the contents of the decode PC register 34, and a store PC register 40 to which is conducted via a data path 42 the contents of the execute PC register 36. The PC stack 32 implements a four-stage instruction address pipeline, as will be described below in connection with Fig. 2.
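The serial connection of the three PC registers behaves like a shift register clocked once per processor cycle. The sketch below models that behaviour; the class and attribute names are illustrative, and the address currently being fetched is treated as the first of the four stages.

```python
class PCStack:
    """Behavioural model of PC stack 32: decode, execute and store PC registers."""

    def __init__(self):
        self.decode_pc = None   # register 34
        self.execute_pc = None  # register 36
        self.store_pc = None    # register 40

    def clock(self, fetch_address):
        """Advance one cycle; fetch_address is the address on data path 28."""
        # Update from the far end so each register takes its neighbour's old value.
        self.store_pc = self.execute_pc
        self.execute_pc = self.decode_pc
        self.decode_pc = fetch_address
```

After three calls to `clock` with addresses 100, 101 and 102, the decode, execute and store registers hold 102, 101 and 100 respectively.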

Also receiving the address generated by MUX 26 is an address incrementer (+1) 44 which generates at an output the address applied to it via data path 28 incremented by 1, i.e., the continue address. A program counter (PROG CNT) register 46 receives the continue address generated by incrementer 44 via a data path 48 and stores this address. The continue address is generated at an output of the PROG CNT register 46 and conducted via data path 20 to the second input of adder 16 and a second input of MUX 26.

The MUX 26 receives a control signal based on the branch condition, described above, to determine whether the branch target address applied on data path 24 or the continue address applied on data path 20 will be applied to the instruction cache 30 to fetch the next instruction to be executed by processor 10. An instruction calling for a branch will be processed by processor 10 so that the branch does not occur until the instruction following the branch instruction is executed. In this manner, the instruction pipeline, implemented by the PC stack pipeline 32, operates without interruption even when a branch instruction enters the pipeline, since no replacement of the instruction which normally follows the branch instruction need be made in the pipeline. Accordingly, the branch occurs within the single cycle of the pipeline allocated to it, as will be described in connection with Fig. 2.
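The interplay of MUX 26, incrementer 44 and the PROG CNT register 46 during one cycle can be sketched as a single function. This is a simplified model under the assumption that addresses are plain integers; the function and parameter names are illustrative only.

```python
def next_fetch(take_branch, branch_target_address, prog_cnt):
    """Model one cycle of MUX 26, incrementer 44 and PROG CNT register 46.

    Returns the address sent to the instruction cache and the new
    contents of the PROG CNT register (the next continue address).
    """
    # MUX 26: branch target (data path 24) or continue address (data path 20).
    fetch_address = branch_target_address if take_branch else prog_cnt
    # Incrementer 44 feeds PROG CNT register 46 with the fetch address plus one.
    new_prog_cnt = fetch_address + 1
    return fetch_address, new_prog_cnt
```

With a continue address of 101 and a branch target of 200, a taken branch fetches from 200 and leaves 201 in the program counter; an untaken branch fetches from 101 and leaves 102.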

A four-stage pipeline is used by the processor 10 of the instant invention: an instruction fetch stage, an instruction decode stage, an instruction execution stage, and a data storage stage. The various stages of the instruction pipeline employed by processor 10 are shown in Fig. 2, illustrating the execution of a branch instruction.

By employing a so-called "delayed branching" technique, the processor 10 can effect execution of a branch instruction in a single processor cycle without requiring complex logic circuitry. The space between the vertical dashed lines in Fig. 2 corresponds to a single processor cycle, each cycle having an equal duration. Shown extending from $t_0$ to $t_1$, during the first cycle of the processor 10, is "BRANCH 1", in which a branch instruction is fetched from instruction cache 30 and stored in the instruction register of processor 10. Shown extending from $t_1$ to $t_2$, during the second cycle of the processor 10, is "BRANCH 2", the decoding of the branch instruction stored in the instruction register. The branch condition needed by the instruction is retrieved from the general-purpose register described above, and the branch target address specified by the instruction is determined, as described above, and conducted on data path 24 to the first input of MUX 26.

During the execution cycle of the branch instruction, "BRANCH 3", extending from $t_2$ to $t_3$, the condition causes processor 10 to generate a control signal which, in turn, causes MUX 26 to select either the branch target address or the continue address to be conducted to the instruction cache 30 for use in fetching the next (logical) instruction. During the storeback cycle, extending from $t_3$ to $t_4$, the instruction (physically) following the branch instruction is in the execute stage of processor 10.

As shown in Fig. 2, a branch delay instruction, "DELAY", physically follows the branch instruction and is always executed; the branch itself does not occur until after the branch delay instruction, whereupon the instruction to which the branch instruction passes control, "TARGET", executes. In this manner, the processor 10 can always fetch the next instruction during the execution of the current instruction, i.e., operate in a pipeline mode without the need to interrupt the pipeline or retract the fetching of an instruction. Accordingly, Fig. 2 indicates that during the second cycle, "DELAY 1", processor 10 fetches from cache 30 the instruction physically following the branch instruction whose fetching, decoding, execution and storeback cycles were just described. This instruction thus occupies the stage in the pipeline immediately following the branch instruction which preceded it. Hence, the decoding, execution and storeback cycles for the branch delay instruction, "DELAY 2", "DELAY 3" and "DELAY 4", will occur during the third, fourth and fifth cycles of processor 10 as shown in Fig. 2.
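The architectural effect of the delayed branch, namely that the delay instruction always executes before control reaches the target, can be sketched with a two-entry program counter model. The tuple-based program encoding and the function name are hypothetical and exist only for this illustration.

```python
def run_delayed(program, max_steps=32):
    """Execute a list of instructions with delayed-branch semantics.

    Each entry is (label, "op") or (label, "branch", condition, target_index).
    Returns the labels in the order in which they execute.
    """
    trace = []
    pc, next_pc = 0, 1  # instruction executing now, and the one already fetched
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        trace.append(instr[0])
        # By default the instruction after next_pc is the sequential one.
        after_next = next_pc + 1
        if instr[1] == "branch" and instr[2]:
            # A taken branch redirects the fetch that follows the delay slot.
            after_next = instr[3]
        pc, next_pc = next_pc, after_next
    return trace


program = [
    ("BRANCH", "branch", True, 3),
    ("DELAY", "op"),
    ("SKIPPED", "op"),
    ("TARGET", "op"),
]
print(run_delayed(program))  # ['BRANCH', 'DELAY', 'TARGET']
```

If the condition in the first entry is changed to `False`, all four instructions execute in sequential order.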

The instruction to which control passes by virtue of the branch instruction, "TARGET", will occupy the stage in the pipeline immediately following the delay instruction, as shown in Fig. 2. Thus the fetching, decoding, execution and storeback cycles for the TARGET instruction, "TARGET 1", "TARGET 2", "TARGET 3" and "TARGET 4", will occur during the third, fourth, fifth and sixth cycles of processor 10 as shown in Fig. 2. The serial connection of the decode PC register 34, the execute PC register 36 and the store PC register 40, clocked at the intervals $t_0$, $t_1$, ..., $t_n$, implements the pipeline described above by storing the addresses of instructions associated with the corresponding pipeline stages.
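The cycle numbering for the three instructions can be tabulated with a short sketch. It assumes an ideal four-stage pipeline in which each instruction enters the fetch stage one cycle after its predecessor, with cycle 1 running from $t_0$ to $t_1$; the stage names and the function are illustrative.

```python
STAGES = ("fetch", "decode", "execute", "storeback")


def schedule(names):
    """Return, for each instruction, the cycle in which it occupies each stage."""
    return {
        name: {stage: issue + offset + 1 for offset, stage in enumerate(STAGES)}
        for issue, name in enumerate(names)
    }


timing = schedule(["BRANCH", "DELAY", "TARGET"])
for name, cycles in timing.items():
    print(name, cycles)
```

In this model TARGET is fetched in cycle 3, the same cycle in which BRANCH is in its execute stage, which is what allows the branch to complete without stalling the pipeline.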

The processor 10 of the instant invention can execute a branch which is called for by a call subroutine instruction by a modification of the delayed branching technique described above in connection with Fig. 2. The call subroutine instruction is fetched from the cache 30 and stored in the instruction register during the first cycle of processor 10 extending from $t_0$ to $t_1$, denoted as "BRANCH 1" in Fig. 2. During the second cycle of the processor 10, "BRANCH 2", the call subroutine instruction is decoded and the contents of the PROG CNT register 46 are generated on data path 20 and, via MUX 26, onto data path 28 for entry into a data path pipeline, not shown in Fig. 1. During the third cycle of the processor 10, "BRANCH 3", the contents of the PROG CNT register are increased by four in an arithmetic logic unit of the processor 10, not shown in Fig. 2, to establish a return address from the subroutine. During the fourth cycle of the processor 10, "BRANCH 4", the return address is saved in a general-purpose register. In all other respects, the technique for executing a call subroutine instruction is as described above in connection with Fig. 2 for executing a branch instruction, where the instruction to which control passes by virtue of the call subroutine instruction, "TARGET", will be the first instruction of the subroutine.
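The return-address handling for a call can be sketched cycle by cycle. This is only an illustration of the sequence described: the register-file dictionary, the `link_reg` name and the function itself are hypothetical, and the offset of four is taken directly from the description without interpreting its units.

```python
def call_subroutine(prog_cnt, target, registers, link_reg, return_offset=4):
    """Model the return-address path of a call subroutine instruction.

    Returns the address of the first instruction of the subroutine.
    """
    # Second cycle: the PROG CNT contents enter the data path pipeline.
    latched = prog_cnt
    # Third cycle: the arithmetic logic unit forms the return address.
    return_address = latched + return_offset
    # Fourth cycle: the return address is stored in a general-purpose register.
    registers[link_reg] = return_address
    # The branch itself proceeds as for an ordinary branch instruction.
    return target


regs = {}
entry = call_subroutine(prog_cnt=100, target=500, registers=regs, link_reg="r_link")
print(entry, regs)  # 500 {'r_link': 104}
```

Keeping the return-address arithmetic in the data path pipeline leaves the program counter logic free to fetch the delay instruction and the subroutine entry without interruption.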

As the processor 10 of the present invention is capable of servicing interrupts and traps, special consideration must be given to the occurrence of an interrupt or trap between a branch or a call subroutine instruction and the delayed-branch instruction which follows it. In this case, the processor 10 must cause the delayed-branch instruction to be executed after return from the interrupt or trap routine, in addition to the target instruction to which control passes by virtue of the branch or call instruction. To assure this result when returning from an interrupt or trap routine, processor 10 must execute two branches: a first branch which causes execution of the branch delay instruction which was pre-empted by the occurrence of the interrupt or trap, and a second branch which causes execution of the target instruction which followed the delayed-branch instruction.

The various stages of the instruction pipeline employed by processor 10 to effect execution of an interrupt or trap routine occurring between a branch or subroutine call instruction and the delayed-branch instruction which follows it are illustrated in Fig. 3A. The pipeline stages employed by processor 10 to effect return from the interrupt or trap routine are illustrated in Fig. 3B. With reference to Fig. 3A, the operation of processor 10 is illustrated by an interrupt occurring at time $t_1$. Modifications to this procedure will not be described herein, as they can be provided by those skilled in the art. For purposes of illustration, a shift instruction is shown as fetched from instruction cache 30 in the preceding cycle extending from $t_0$ to $t_1$. The pipeline also contains, for purposes of illustration, a jump instruction, followed by an add instruction, followed by the shift instruction. Since the jump instruction was executed just before occurrence of the interrupt and the add and shift instructions had yet to be executed, it will be necessary for processor 10 to return from the interrupt routine and then execute the add and shift instructions. The addresses of these instructions must be saved before transfer to the interrupt routine. Accordingly, a "SAVE\_PC\_JUMP" instruction is indicated in Fig. 3A as fetched during the cycle extending from $t_1$ to $t_2$, following occurrence of the interrupt. This will cause processor 10 to save the address of the branch delay instruction, namely the add instruction which was to execute during the cycle extending from $t_1$ to $t_2$ following occurrence of the interrupt. The contents of the execute PC register 36 (Fig. 1) portion of the PC stack 32 will accordingly be saved. Also, the "SAVE\_PC\_JUMP" instruction will cause processor 10 to generate the contents of the BRN TGT multiplexer/register 12 onto data path 24 and, via MUX 26, onto data path 28 and therefrom to instruction cache 30, these contents being the address of the first instruction of the interrupt routine. As shown in Fig. 3A, processor 10 will fetch, during the cycle extending from $t_2$ to $t_3$, a "SAVE\_PC" instruction, which will cause processor 10 to save the address of the target instruction, namely the shift instruction, which would normally have followed the add instruction. The first instruction of the interrupt routine, designated the "INTERRUPT HANDLER" in Fig. 3A, will then be fetched by processor 10 during the cycle extending from $t_3$ to $t_4$.
The decoding and execution of the shift and add instructions, respectively, are accordingly aborted, as indicated in Fig. 3A by the designations "(ADD)" and "(SHIFT)" during the decoding and execution stages. During subsequent storeback stages, the "SAVE\_PC\_JUMP" and "SAVE\_PC" instructions cause processor 10 to save the addresses of the add instruction and the shift instruction, as indicated in Fig. 3A.

With reference to Fig. 3B, the interrupt routine will complete by causing the processor 10 to fetch two jump indirect instructions, which are shown in Fig. 3B as being decoded during the cycles extending from $t_0'$ to $t_1'$ and $t_1'$ to $t_2'$. To return from the interrupt routine, processor 10 will perform an indirect jump via the value saved by the "SAVE\_PC\_JUMP" instruction described in connection with Fig. 3A and will accordingly fetch the add instruction from cache 30 during the cycle extending from $t_1'$ to $t_2'$. It will then perform an indirect jump via the value saved by the "SAVE\_PC" instruction and will accordingly fetch the shift instruction from cache 30 during the cycle extending from $t_2'$ to $t_3'$, as shown in Fig. 3B. Thus, the processor 10 will execute these instructions in their order of occurrence in the pipeline just prior to the occurrence of the interrupt.
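The reason two saved addresses and two consecutive indirect jumps suffice can be seen in a small behavioural model with delayed-branch semantics. Everything in it is an assumption for illustration: the addresses, the `JMPI` mnemonic, the `saved` tuple holding first the delay-instruction address and then the target address, and the dictionary-based program.

```python
def run(program, pc, next_pc, saved, max_steps=16):
    """Execute with delayed-branch semantics, starting at (pc, next_pc).

    program maps addresses to tuples; ("JMPI", i) jumps indirectly via saved[i].
    Returns the mnemonics in execution order.
    """
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break
        op = program[pc]
        trace.append(op[0])
        after_next = next_pc + 1
        if op[0] == "JMPI":
            # The jump takes effect only after the following instruction.
            after_next = saved[op[1]]
        pc, next_pc = next_pc, after_next
    return trace


# Interrupted state: the add (address 11) was next to execute, then the shift (50).
saved = (11, 50)
program = {
    11: ("ADD",), 50: ("SHIFT",), 51: ("NEXT",),
    100: ("HANDLER",), 101: ("JMPI", 0), 102: ("JMPI", 1),
}
print(run(program, pc=100, next_pc=101, saved=saved))
# ['HANDLER', 'JMPI', 'JMPI', 'ADD', 'SHIFT', 'NEXT']
```

The second jump sits in the delay slot of the first, so the add executes in the delay slot of the second jump and the shift follows it, after which sequential execution resumes from the shift's successor.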

CLAIMS

1. A digital instruction processor control which cyclically executes, in a single cycle, instructions from a set, including a plurality of plural-bit branch instructions, stored in an instruction cache having a plurality of locations each with a designator, said processor control comprising:

means for generating signals indicative of a "continue with next instruction" address;

means responsive to said continue address signals and to predetermined bit portions of said branch instructions for generating signals indicative of a "branch target" address; and

first multiplexer means having an output terminal connected to said instruction cache responsive to a control signal indicative of the contents of a predetermined "condition" bit portion of said branch instruction, to said branch target address signals applied to a first input terminal thereof and said continue address signals applied to a second input terminal thereof for selectively conducting to said output terminal one of said address signals indicative of a location within said instruction cache from which to fetch the next instruction to be processed by said instruction processor.

2. A digital instruction processor control according to claim 1 wherein said branch target address generating means comprises:

second multiplexer/register means having an output terminal connected to said first input terminal of said first multiplexer means responsive to a control signal indicative of which of said plurality of branch instructions is being executed, to signals applied to a first input terminal indicative of an "indirect address" determined by said branch instructions, and to signals applied to a second input terminal indicative of a "relative or absolute branch address" for selectively conducting to said output terminal one of said address signals indicative of said branch target address; and

adder means having an output terminal connected to said second input terminal of said second multiplexer/register means responsive to said control signal indicative of which of said plurality of branch instructions is being executed, to signals applied to a first input terminal indicative of a predetermined "branch displacement" portion of said branch instructions, and to said continue address signals applied to a second input terminal for selectively generating at said output terminal an arithmetical combination of said signals applied to said first and second input terminals.

3. A digital instruction processor control according to claim 1 further including means connected to said output terminal of said first multiplexer means for storing at least three signals indicative of instruction cache location designators, each instruction therein designated occupying a stage in an instruction "pipeline", for generating signals representative of said contents stored therein, and for updating the contents of said location designators stored therein so that said instruction cache location designator conducted by said first multiplexer during the preceding cycle of said instruction processor replaces the contents of a first storage location thereof, the instruction cache location designator stored in said first storage location replaces the contents of a second storage location, and the instruction cache location designator stored in said second storage location replaces the contents of a third storage location.

4. A digital instruction processor according to claim 3 wherein said pipeline means comprises a first clocked register having an input terminal connected to said output terminal of said first multiplexer means and an output terminal, a second clocked register having an input terminal connected to said output terminal of said first register and an output terminal, and a third register having an input terminal connected to said output terminal of said second register.

5. A method of performing branches in one cycle of a digital instruction processor control having a program counter which cyclically executes instructions from a set, including a plurality of branch instructions each determining a "branch condition", stored in an instruction cache having a plurality of locations each with a designator, comprising the steps of:

a) fetching from said cache at the location designator specified by the contents of said program counter a branch instruction and storing said instruction in an instruction register;

b) decoding said instruction stored in said instruction register;

c) saving said branch condition determined by said instruction;

d) determining a branch target address based on the information generated at decoding step (c), and the contents of said program counter;

e) fetching an instruction from a location in said cache determined from said branch target address determined at step (d), and said branch condition information generated at step (c); and

f) replacing the contents of the program counter with the address used to fetch said instruction at step (e) incremented by one.

6. A one-cycle branching method according to claim 5 further including a method for calling a procedure in one cycle wherein said instruction set further includes a procedure call instruction, wherein step (a) calls for fetching said procedure call instruction, said method further including the steps of:

g) determining a call return address based on the information generated at decoding step (b) and the contents of said program counter; and

h) saving said call return address determined at step (g).

7. A method of processing an interrupt routine by a digital instruction processor control having an instruction pipeline which cyclically executes instructions from a set stored in an instruction cache having a plurality of locations each with a designator, comprising the steps of:

a) saving the location designator of the instruction placed in the pipeline two cycles prior to the occurrence of the interrupt;

b) saving the location designator of the instruction placed in the pipeline one cycle prior to the occurrence of the interrupt;

c) transferring control to the first instruction of said interrupt routine;

d) prior to returning from said interrupt routine fetching for said pipeline the instruction located at the location designator saved at step (a); and

e) prior to returning from said interrupt routine fetching for said pipeline the instruction located at the location designator saved at step (b).

8. An interrupt processing method according to claim 7 wherein said instruction processor control has a program counter and a data path pipeline, said instruction set includes an interrupt procedure call instruction, and wherein transferring control step (c) comprises the steps of:

c1) fetching from said cache at the location designator specified by the contents of said program counter said interrupt procedure call instruction and storing said instruction in an instruction register;

c2) decoding said instruction stored in said instruction register;

c3) determining a branch target address based on the information generated at decoding step (c2), and the contents of said program counter;

c4) fetching an instruction from a location in said cache determined from said branch target address determined at step (c3);

c5) replacing the contents of the program counter with the address used to fetch said instruction at step (c4) incremented by four; and

c6) conducting the contents of said program counter to said data path pipeline for use in subsequent storage operations.

9. An interrupt processing method according to claim 7, further including the steps of:

f) following return from said interrupt routine fetching from said cache at the location designator saved at step (a) and storing said instruction in an instruction register; and

g) following return from said interrupt routine fetching from said cache at the location designator saved at step (b) and storing said instruction in an instruction register.

10. An interrupt processing method according to claim 9, wherein said instruction processor control has a program counter, said instruction set includes an interrupt procedure call instruction, and wherein fetching step (f) comprises the steps of:

f1) fetching from said cache at the location designator specified by the contents of said program counter said indirect branch instruction and storing said instruction in an instruction register;

f2) decoding said instruction stored in said instruction register;

f3) determining an indirect branch address based on the information generated at decoding step (f2), and the contents of said program counter;

f4) fetching an instruction from a location in said cache determined from said indirect branch address determined at step (f3); and

f5) replacing the contents of the program counter with the address used to fetch said instruction at step (f4) incremented by one.



FIG. 1


BRANCH / CALL:  
 CACHE FETCH      BRANCH 1      TARGET 1  
 INT DECODE      BRANCH 2      TARGET 2  
 EXECUTE      BRANCH 3      TARGET 3  
 STOREBACK      BRANCH 4      TARGET 4



FIG. 2

FIG. 3A



FIG. 3B

